Latent Space: The AI Engineer Podcast
Outlasting Noam Shazeer, crowdsourcing Chat + AI with >1.4m DAU, and becoming the "Western DeepSeek" — with William Beauchamp, Chai Research

Update: 2025-01-26
Description

One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - If you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!

While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny but incredibly cracked engineering team: Chai Research. In short order they have:

* Started a Chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.

* Crossed 1m DAU in 2.5 years - William updates us on the pod that they’ve hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed $22m.

* Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.

While they’re not paying million-dollar salaries, you can tell they’re doing pretty well for an 11-person startup:

The Chai Recipe: Building infra for rapid evals

Remember how the central thesis of LMArena (formerly LMSYS) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners?

At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized in retaining chat users for Chai’s use cases (therapy, assistant, roleplay, etc.). It’s basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot):

Chai publishes occasional research on how they think about this, including talks at their Palo Alto office:

William expands upon this in today’s podcast (34 mins in):

Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it.

And we can then have a really accurate ranking of, like, which models users are finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.
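To make the feedback loop William describes a bit more concrete, here is a minimal sketch of turning user ratings into a model leaderboard. The podcast does not say how Chai actually computes its rankings, so everything here is an assumption for illustration: the 1-5 rating scale, the `rank_models` helper, the `min_ratings` cutoff, and ranking by plain mean score.

```python
from collections import defaultdict
from statistics import mean


def rank_models(ratings, min_ratings=1):
    """Rank models by mean user rating, best first.

    ratings: iterable of (model_id, score) pairs (hypothetical 1-5 scale).
    min_ratings: models with fewer ratings than this are left out, so a
    single lucky rating cannot put a model at the top.
    Returns a list of (model_id, mean_score, rating_count) tuples.
    """
    by_model = defaultdict(list)
    for model_id, score in ratings:
        by_model[model_id].append(score)

    table = [
        (model_id, mean(scores), len(scores))
        for model_id, scores in by_model.items()
        if len(scores) >= min_ratings
    ]
    return sorted(table, key=lambda row: row[1], reverse=True)


# Made-up ratings for three submitted models.
sample = [
    ("model_a", 4), ("model_b", 2), ("model_a", 5),
    ("model_b", 3), ("model_c", 5),
]
leaderboard = rank_models(sample)
strict_leaderboard = rank_models(sample, min_ratings=2)
```

With a few thousand users rating each submission, a table like this can fill in within hours rather than waiting 30 days for a retention cohort, which is the speedup the quote is pointing at.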

In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai’s routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups:

William is notably counter-consensus in a lot of his AI product principles:

* No streaming: Chats appear all at once to allow rejection sampling

* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.

* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (that’s it!)
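Two of these principles are easy to sketch in code. Blending is just a random choice of model per request, and rejection sampling (here in its best-of-n form) needs every candidate reply to be complete before it can be scored, which is why the chosen reply appears all at once instead of streaming. The model functions and the length-based reward below are toy stand-ins we made up; they are not Chai's models or reward model.

```python
import random


def smart_model(prompt, rng):
    # Stand-in for a call to a "smart" LLM (hypothetical).
    return rng.choice(["A precise answer.", "A precise answer with more context."])


def funny_model(prompt, rng):
    # Stand-in for a call to a "funny" LLM (hypothetical).
    return rng.choice(["A joke.", "A joke with a longer setup and punchline."])


def toy_reward(reply):
    # Placeholder for a learned reward model: longer replies score higher.
    return len(reply)


def best_of_n(generate, reward, n):
    """Draw n complete candidates and keep the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)


def respond(prompt, models, rng, n_candidates=4):
    """Blend by picking a model uniformly at random, then apply best-of-n."""
    model = rng.choice(models)  # 50/50 when two models are passed in
    reply = best_of_n(lambda: model(prompt, rng), toy_reward, n_candidates)
    return model.__name__, reply


rng = random.Random(0)
picks = [respond("hi", [smart_model, funny_model], rng) for _ in range(1000)]
smart_share = sum(name == "smart_model" for name, _ in picks) / len(picks)
```

Over many requests `smart_share` should land near 0.5, so a single conversation ends up mixing replies from both models.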

But chief above all is the recommender system.

We also referenced Exa CEO Will Bryk’s concept of SuperKnowledge:

Full Video version

On YouTube. Please like and subscribe!

Timestamps

* 00:00:04 Introductions and background of William Beauchamp

* 00:01:19 Origin story of Chai AI

* 00:04:40 Transition from finance to AI

* 00:11:36 Initial product development and idea maze for Chai

* 00:16:29 User psychology and engagement with AI companions

* 00:20:00 Origin of the Chai name

* 00:22:01 Comparison with Character AI and funding challenges

* 00:25:59 Chai's growth and user numbers

* 00:34:53 Key inflection points in Chai's growth

* 00:42:10 Multi-modality in AI companions and focus on user-generated content

* 00:46:49 Chaiverse developer platform and model evaluation

* 00:51:58 Views on AGI and the nature of AI intelligence

* 00:57:14 Evaluation methods and human feedback in AI development

* 01:02:01 Content creation and user experience in Chai

* 01:04:49 Chai Grant program and company culture

* 01:07:20 Inference optimization and compute costs

* 01:09:37 Rejection sampling and reward models in AI generation

* 01:11:48 Closing thoughts and recruitment

Transcript

Alessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.

swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're founder of Chai AI, but previously, I think you're concurrently also running your fund?

William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.

swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the... consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai. Why decide to work on a consumer AI platform rather than B2B SaaS?

William [00:01:19]: So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like Jane Street or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at Jane Street.


swyx + Alessio